Explainable user clustering in short text streams

Abstract

User clustering has been studied from different angles: behavior-based, to identify similar browsing or search patterns, and content-based, to identify shared interests. Once user clusters have been found, they can be used for recommendation and personalization. So far, content-based user clustering has mostly focused on static sets of relatively long documents. Given the dynamic nature of social media, there is a need to dynamically cluster users in the context of short text streams. User clustering in this setting is more challenging than in the case of long documents as it is difficult to capture the users' dynamic topic distributions in sparse data settings. To address this problem, we propose a dynamic user clustering topic model (or UCT for short). UCT adaptively tracks changes of each user's time-varying topic distribution based both on the short texts the user posts during a given time period and on the previously estimated distribution. To infer changes, we propose a Gibbs sampling algorithm where a set of word-pairs from each user is constructed for sampling. The clustering results are explainable and human-understandable, in contrast to many other clustering algorithms. For evaluation purposes, we work with a dataset consisting of users and tweets from each user. Experimental results demonstrate the effectiveness of our proposed clustering model compared to state-of-the-art baselines.
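
The two ideas named in the abstract, building a set of word-pairs per user and updating a user's topic distribution from both current posts and the previous estimate, can be illustrated with a minimal sketch. This is not the paper's UCT model or its Gibbs sampler: the function names, whitespace tokenization, the `prior_weight` parameter, and the simple count-plus-prior update rule are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_user_word_pairs(posts):
    """Collect unordered word-pairs from each user's short texts.

    posts: iterable of (user, text) tuples for one time period.
    Returns {user: [(w1, w2), ...]}. Whitespace tokenization is an
    assumption; the abstract does not say how pairs are formed.
    """
    pairs = defaultdict(list)
    for user, text in posts:
        # Deduplicate and sort so each pair has a canonical order.
        tokens = sorted(set(text.lower().split()))
        pairs[user].extend(combinations(tokens, 2))
    return dict(pairs)


def update_topic_distribution(prev_theta, topic_counts, prior_weight=1.0):
    """Blend current topic counts with the previous estimate.

    Hypothetical update rule: the previous distribution acts as a
    weighted pseudo-count prior, then the result is normalized.
    """
    unnorm = [c + prior_weight * p for c, p in zip(topic_counts, prev_theta)]
    total = sum(unnorm)
    return [v / total for v in unnorm]


if __name__ == "__main__":
    posts = [("alice", "topic models rock"), ("alice", "short text")]
    print(build_user_word_pairs(posts))
    print(update_topic_distribution([0.5, 0.5], [3, 1], prior_weight=2.0))
```

In this sketch a larger `prior_weight` makes the new estimate stay closer to the previous period's distribution, which is one simple way to smooth over sparse short-text data.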